Abstract

Suppose that a solution $\hat{x}$ to an underdetermined linear system $b = Ax$ is given, where $\hat{x}$ is approximately sparse, meaning that it has a few large components compared to its other, small entries. However, the total number of nonzero components of $\hat{x}$ is large enough to violate every condition for the uniqueness of the sparsest solution. On the other hand, if only the dominant components are considered, then the uniqueness conditions are satisfied. One intuitively expects that $\hat{x}$ should not be far from the true sparse solution $x_0$. We show that this intuition is correct by providing an upper bound on $\|\hat{x} - x_0\|$ which is a function of the magnitudes of the small components of $\hat{x}$ but independent of $x_0$. This result is extended to the case in which $b$ is perturbed by noise. Additionally, we generalize the upper bounds to the low-rank matrix recovery problem.

Keywords: Approximately sparse solutions, low-rank matrix recovery, restricted isometry property, sparse vector 
recovery 


1. Introduction 

Let $x_0 \in \mathbb{R}^m$ denote a sparse solution of an underdetermined system of linear equations

$$ b = Ax, \tag{1} $$

in which $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times m}$ with $m > n$. Suppose that $\|x_0\|_0 = k$, where $\|x_0\|_0$ designates the number of nonzero components, or the $\ell_0$ norm, of $x_0$. Further, let $\mathrm{spark}(A)$ represent the spark of $A$, defined as the minimum number of columns of $A$ that are linearly dependent, and let $\delta_{2k}(A)$ denote the restricted isometry constant of order $2k$ for the matrix $A$ [?]. It is well known that if $k < \mathrm{spark}(A)/2$ or $\delta_{2k}(A) < 1$, then $x_0$ is the unique sparsest solution of the above set of equations [?].

When the sparsest solution of (1) is sought, one needs to solve

$$ \min_x \|x\|_0 \quad \text{subject to} \quad Ax = b. \tag{2} $$

However, the above program is NP-hard in general and becomes intractable as the dimensions of the problem increase. Since finding the sparse solution of (1) has many applications in various fields of science and engineering (cf. [?] for a comprehensive list of applications), many practical alternatives to (2) have been proposed [?]. If the solution obtained by these algorithms satisfies one of the above sufficient conditions, then this solution is assuredly the sparsest one.

Now, consider the case in which the solution given by an algorithm is only approximately sparse, meaning that it has some dominant components while its other components are very small but not equal to zero. If the total number of nonzero components is so large that neither of the mentioned conditions holds, it is not clear whether this solution is close to the true sparse solution. Intuitively, however, one expects that if the number of effective components is small, then the obtained solution should not be far from the true solution. The following questions immediately arise. Is this solution still close to the unique sparse solution of $b = Ax$? Is it possible in this case to establish a bound on the error of finding $x_0$ without knowing $x_0$? Similar questions can be asked when there is error or noise in (1). Taking the noise into account, (1) is updated to


$$ b = Ax + e, \tag{3} $$

where $e$ is the vector of noise or error. In this setting, to estimate $x_0$ given $b$ and $A$, the equality constraint in (2) is relaxed, and the following optimization problem should be solved:

$$ \min_x \|x\|_0 \quad \text{subject to} \quad \|Ax - b\| \le \epsilon, \tag{4} $$

where $\epsilon \ge \|e\|$ is some constant and $\|\cdot\|$ designates the $\ell_2$ norm.
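To make the combinatorial nature of (2) and (4) concrete, the following Python/NumPy sketch solves both problems by exhaustive search over supports of increasing size. It is only an illustration for toy dimensions, since its cost grows like $\binom{m}{s}$ for support size $s$. The function name `l0_min_bruteforce` and the round-off tolerance `tol` are our own assumptions, not part of the original text.

```python
import itertools

import numpy as np


def l0_min_bruteforce(A, b, eps=0.0, tol=1e-10):
    """Sparsest x with ||Ax - b|| <= eps, found by exhaustive search.

    eps = 0 corresponds to problem (2) and eps > 0 to problem (4); tol only
    absorbs floating-point round-off in the residual test.
    """
    n, m = A.shape
    if np.linalg.norm(b) <= eps + tol:  # x = 0 is already feasible
        return np.zeros(m)
    for s in range(1, m + 1):  # support sizes in increasing order
        for support in itertools.combinations(range(m), s):
            cols = list(support)
            # Least squares restricted to the candidate support.
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - b) <= eps + tol:
                x = np.zeros(m)
                x[cols] = coef
                return x
    return None  # no feasible point (cannot happen if A has full row rank)


# Toy example: a 2-sparse x0 and a 4 x 8 Gaussian A (spark(A) = 5 almost surely).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
x0 = np.zeros(8)
x0[[2, 5]] = [1.0, -0.5]
b = A @ x0
x_rec = l0_min_bruteforce(A, b)
print(np.allclose(x_rec, x0))
```

Because $k = 2 < \mathrm{spark}(A)/2$ here, the search is expected to return $x_0$ itself. Already for moderately sized $m$ the number of supports makes this approach impractical, which is what motivates the practical alternatives mentioned above.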

The answers to the above questions were first given in [?]. Let $\hat{x}$ denote the output of an algorithm to find or estimate $x_0$ from (1) or (3). In particular, [?] provides two upper bounds on the error $\|x_0 - \hat{x}\|$. The first one is simple to compute but turns out to be loose. The second bound, on the other hand, is tight but generally much more complicated to compute.

Herein, in the spirit of the loose bound in [?], we provide a better bound which is based on the same parameter of the matrix $A$ but is strictly tighter than the loose bound in [?]. Moreover, our proposed bound is obtained in a much simpler way, with shorter algebraic manipulations. The proposed bound is extended to the noisy setting defined in (3). Furthermore, these results are also generalized to the problem of low-rank matrix recovery from compressed linear measurements [?].

The bounds introduced in this paper can be used in analyzing the performance of algorithms for sparse vector and low-rank matrix recovery, especially those algorithms that provide approximately sparse or low-rank solutions, such as [?] and [?]. Other algorithms, under some conditions, can also benefit from the analysis presented in this paper. It is known that the solution obtained by some numerical solvers of basis pursuit [?], like ℓ1-magic [?], is usually not exactly sparse. In fact, due to limited numerical accuracy, the obtained solution has some very small nonzero entries. Our results can be used to find upper bounds on the $\ell_2$ norm of this kind of error. Finally, when greedy algorithms [?] are used with an overestimated number of nonzero components of the true solution, our bound can be exploited to characterize the conditions under which the given solution is close to the true one. However, the bounds are obtained without any assumption on the recovery algorithm, and it may be possible to improve them by exploiting the properties of a specific algorithm. A similar upper bound on the error of sparse recovery in the noisy case has been proposed in [?]. This upper bound, however, is only applicable when the given solution has a sparsity level (number of nonzero components) not greater than that of the true solution, whereas our bounds are obtained under the opposite assumption on the sparsity level of the given solution.

The rest of this paper is organized as follows. After introducing the notation used throughout the paper, in Section 2 we first present the upper bounds on the error of sparse vector recovery and then generalize them to the low-rank matrix recovery problem. Section 3 is devoted to the proofs of the results in Section 2, followed by conclusions in Section 4.

Notations: For a vector $x$, $\|x\|$, $\|x\|_1$, and $\|x\|_0$ denote the $\ell_2$, $\ell_1$, and the so-called $\ell_0$ norms, respectively. Moreover, $x^{\downarrow}$ denotes the vector obtained by sorting the elements of $x$ by magnitude in descending order, and $x_i$ designates the $i$th component of $x$. $x_I$ represents the subvector obtained from $x$ by keeping the components indexed by the set $I$. A vector is called $k$-sparse if it has exactly $k$ nonzero components. For a matrix $A$, $a_i$ denotes its $i$th column. Additionally, $\mathrm{spark}(A)$ and $\mathrm{null}(A)$ designate the minimum number of columns of $A$ that are linearly dependent and the null space of $A$, respectively. Similar to vectors, $A_I$ represents the submatrix of $A$ obtained by keeping the columns indexed by $I$. The singular values of matrices are always assumed to be sorted in descending order, and $\sigma_i(X)$ denotes the $i$th largest singular value of $X$. Let $X = \sum_{i=1}^{q} \sigma_i(X) u_i v_i^T$, where $q = \mathrm{rank}(X)$, denote the singular value decomposition (SVD) of $X$. $X_{(r)} = \sum_{i=1}^{r} \sigma_i(X) u_i v_i^T$ represents the matrix obtained by keeping the first $r$ terms in the SVD of $X$, and $X_{(-r)} = X - X_{(r)}$. $\|X\|_F$ denotes the Frobenius norm, and $\|X\|_* = \sum_{i=1}^{q} \sigma_i(X)$, in which $q = \mathrm{rank}(X)$, stands for the nuclear norm.
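The notation above maps directly onto NumPy operations. The sketch below uses helper names that are our own, hypothetical choices. It computes $x^{\downarrow}$ and the split $X = X_{(r)} + X_{(-r)}$ from a truncated SVD. It also checks the inequality $\|Y\|_* \le \sqrt{\mathrm{rank}(Y)}\,\|Y\|_F$, which is used in the proof of Lemma 3.

```python
import numpy as np


def sort_by_magnitude(x):
    """x↓: the entries of x (with their signs) sorted by magnitude, largest first."""
    return x[np.argsort(-np.abs(x), kind="stable")]


def split_rank(X, r):
    """Return (X_(r), X_(-r)): the first r SVD terms of X and the remainder."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_r = (U[:, :r] * s[:r]) @ Vt[:r, :]
    return X_r, X - X_r


print(sort_by_magnitude(np.array([1.0, -3.0, 2.0])))

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))
X_r, X_mr = split_rank(X, 2)
# ||X||_F^2 = ||X_(r)||_F^2 + ||X_(-r)||_F^2 (the two parts are orthogonal).
print(np.isclose(np.linalg.norm(X, "fro") ** 2,
                 np.linalg.norm(X_r, "fro") ** 2 + np.linalg.norm(X_mr, "fro") ** 2))
# Nuclear norm (sum of singular values) versus sqrt(rank) * Frobenius norm.
nuc = np.linalg.norm(X_mr, "nuc")
print(nuc <= np.sqrt(2) * np.linalg.norm(X_mr, "fro"))
```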
2. Upper Bounds

In this section, the upper bounds on the error of sparse vector and low-rank matrix recovery are presented. 

2.1. Sparse Vector Recovery 

Following common practice in the compressive sensing (CS) literature, we refer to $b$, $A$, and $e$ in (3) as the measurement vector, sensing matrix, and noise vector, respectively. Before stating the results, we recall two definitions.

Definition 1 ([?]). For a matrix $A \in \mathbb{R}^{n \times m}$ and all integers $k \le m$, the restricted isometry constant (RIC) of order $k$ is the smallest constant $\delta_k(A)$ such that

$$ (1 - \delta_k(A))\|x\|^2 \le \|Ax\|^2 \le (1 + \delta_k(A))\|x\|^2 \tag{5} $$

holds for all vectors $x$ with sparsity at most $k$.
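For small matrices, $\delta_k(A)$ in (5) can be computed by brute force. The sketch below is our own illustration, and the function name `ric` is hypothetical. On a fixed support $S$, (5) is equivalent to every eigenvalue of $A_S^T A_S$ lying in $[1 - \delta_k, 1 + \delta_k]$. By eigenvalue interlacing, it suffices to check supports of size exactly $k$. The cost grows like $\binom{m}{k}$, so this is only feasible for toy dimensions.

```python
import itertools

import numpy as np


def ric(A, k):
    """Brute-force restricted isometry constant delta_k(A) of Definition 1."""
    delta = 0.0
    for support in itertools.combinations(range(A.shape[1]), k):
        cols = list(support)
        # Eigenvalues of the Gram matrix A_S^T A_S, in ascending order.
        eig = np.linalg.eigvalsh(A[:, cols].T @ A[:, cols])
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta


rng = np.random.default_rng(2)
n, m, k = 20, 30, 2
A = rng.standard_normal((n, m)) / np.sqrt(n)  # columns have unit norm on average
d = ric(A, k)

# Sanity check of (5) on a random k-sparse vector.
x = np.zeros(m)
x[rng.choice(m, size=k, replace=False)] = rng.standard_normal(k)
xx = x @ x
energy = np.sum((A @ x) ** 2)
print(d, (1 - d) * xx <= energy <= (1 + d) * xx)
```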

Definition 2 ([?]). For a matrix $A \in \mathbb{R}^{n \times m}$, let $\sigma_{\min,p}(A) > 0$, for $p \le \mathrm{spark}(A) - 1$, be the smallest singular value among all possible $n \times p$ submatrices of $A$.
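Both $\mathrm{spark}(A)$ and $\sigma_{\min,p}(A)$ can likewise be computed by exhaustive search for small matrices. The following sketch uses hypothetical helper names and assumes that NumPy's default rank tolerance is adequate. It illustrates Definition 2 on a $2 \times 3$ example whose columns are $e_1$, $e_2$, and $e_1 + e_2$, so that $\mathrm{spark}(A) = 3$.

```python
import itertools

import numpy as np


def sigma_min_p(A, p):
    """sigma_min,p(A): smallest singular value over all n x p column submatrices."""
    n, m = A.shape
    if p > n:
        raise ValueError("p must not exceed the number of rows")
    return min(
        np.linalg.svd(A[:, list(S)], compute_uv=False)[-1]
        for S in itertools.combinations(range(m), p)
    )


def spark(A):
    """Smallest number of linearly dependent columns (inf if all are independent)."""
    m = A.shape[1]
    for p in range(1, m + 1):
        for S in itertools.combinations(range(m), p):
            if np.linalg.matrix_rank(A[:, list(S)]) < p:
                return p
    return np.inf


A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(spark(A), sigma_min_p(A, 1), sigma_min_p(A, 2))
```

In this example, $\sigma_{\min,1}(A) = 1$ is the smallest column norm. The value $\sigma_{\min,2}(A) = (\sqrt{5} - 1)/2 \approx 0.618$ is attained by the pairs $\{e_1, e_1 + e_2\}$ and $\{e_2, e_1 + e_2\}$.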

The following theorem presents the upper bounds for both the noisy and noiseless cases. We deliberately treat the two cases separately in order to provide a tighter bound in the noiseless setting.

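Theorem 1. Let $A \in \mathbb{R}^{n \times m}$, $m > n$, denote a sensing matrix. We have the following upper bounds.

• Noiseless case: Suppose that $x_0$ is a $k$-sparse solution of $Ax = b$, where $k < \mathrm{spark}(A)/2$. For every solution $\hat{x}$ of $Ax = b$ satisfying $|\hat{x}^{\downarrow}_{k+1}| \le \alpha$,

$$ \|x_0 - \hat{x}\|^2 \le \Big(1 + (m - 2k)\frac{\max_i \|a_i\|^2}{\sigma^2_{\min,2k}(A)}\Big)(m - 2k)\,\alpha^2. \tag{6} $$

• Noisy case: Let $x_0$ be any arbitrary vector with $\|x_0\|_0 = k < \mathrm{spark}(A)/2$, and let $b = Ax_0 + e$, where $e$ is noise with $\|e\| \le \epsilon$. For every vector $\hat{x}$ satisfying $\|b - A\hat{x}\| \le \eta$ and $|\hat{x}^{\downarrow}_{k+1}| \le \alpha$, the error $\|x_0 - \hat{x}\|$ is bounded by

$$ \|x_0 - \hat{x}\| \le \Big(1 + \sqrt{m - 2k}\,\frac{\max_i \|a_i\|}{\sigma_{\min,2k}(A)}\Big)\sqrt{m - 2k}\,\alpha + \frac{\eta + \epsilon}{\sigma_{\min,2k}(A)}. \tag{7} $$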

In brief, the above bounds say that if we have a solution $\hat{x}$ that consists of $k$ large components, then this vector is not far from the sparse solution, provided that $\sigma_{\min,2k}(A)$ is not very small. In particular, the bound in (6) vanishes when $\hat{x}$ is $k$-sparse (since $\alpha$ can then be taken as $0$), reducing to the well-known uniqueness theorem in [?]. Moreover, notice that these bounds hold uniformly for all sparse vectors $x_0$ of sparsity level $k$; that is, they are independent of the positions and magnitudes of the nonzero components of $x_0$.
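The bounds (6) and (7) are straightforward to evaluate once $\sigma_{\min,2k}(A)$ is known. The sketch below uses a hypothetical helper (the names are ours). It takes $\alpha$ as the $(k+1)$th largest magnitude of $\hat{x}$, which is the smallest value permitted by the hypothesis, and checks both bounds on a toy instance:

- In the noiseless case, $\hat{x}$ is $x_0$ plus a small null-space vector, so $A\hat{x} = b$ while $\hat{x}$ has no zero entries.
- In the noisy case, $\hat{x}$ is a small dense perturbation of $x_0$.

Here $\sigma_{\min,2k}(A)$ is obtained by brute force, which is only feasible for small $m$.

```python
import itertools

import numpy as np


def theorem1_bounds(A, x_hat, k, sigma_2k, eta=0.0, eps=0.0):
    """Right-hand sides of (6) (a bound on the squared error) and (7)."""
    m = A.shape[1]
    q = m - 2 * k
    alpha = np.sort(np.abs(x_hat))[::-1][k]  # (k+1)-th largest magnitude of x_hat
    a_max = np.max(np.linalg.norm(A, axis=0))  # max_i ||a_i||
    bound6 = (1 + q * a_max**2 / sigma_2k**2) * q * alpha**2
    bound7 = (1 + np.sqrt(q) * a_max / sigma_2k) * np.sqrt(q) * alpha + (eta + eps) / sigma_2k
    return bound6, bound7


rng = np.random.default_rng(3)
n, m, k = 6, 12, 2  # 2k = 4 < spark(A) = 7 almost surely
A = rng.standard_normal((n, m))
sigma_2k = min(np.linalg.svd(A[:, list(S)], compute_uv=False)[-1]
               for S in itertools.combinations(range(m), 2 * k))
x0 = np.zeros(m)
x0[[1, 7]] = [1.5, -2.0]

# Noiseless case: x_hat = x0 + (small vector in null(A)), hence A x_hat = A x0.
null_basis = np.linalg.svd(A)[2][n:].T  # columns span null(A)
x_hat = x0 + 1e-3 * null_basis @ rng.standard_normal(m - n)
bound6, _ = theorem1_bounds(A, x_hat, k, sigma_2k)
err6 = np.sum((x0 - x_hat) ** 2)

# Noisy case: b = A x0 + e and an estimate whose residual norm defines eta.
e = 1e-3 * rng.standard_normal(n)
b = A @ x0 + e
x_est = x0 + 1e-3 * rng.standard_normal(m)
eta, eps = np.linalg.norm(b - A @ x_est), np.linalg.norm(e)
_, bound7 = theorem1_bounds(A, x_est, k, sigma_2k, eta, eps)
err7 = np.linalg.norm(x0 - x_est)
print(err6 <= bound6, err7 <= bound7)
```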

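Remark 1. The loose bounds in [?, Theorems 2 & 4], translated to our notation, are, in the noiseless and noisy settings, respectively,

$$ \|x_0 - \hat{x}\| \le \Big(1 + \frac{m}{\sigma_{\min,2k}(A)}\Big) m\alpha, \tag{8} $$

$$ \|x_0 - \hat{x}\| \le \Big(1 + \frac{m}{\sigma_{\min,2k}(A)}\Big) m\alpha + \frac{\eta + \epsilon}{\sigma_{\min,2k}(A)}. \tag{9} $$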

The bounds in (8) and (9) are applicable only if the sensing matrix has unit-$\ell_2$-norm columns, whereas Theorem 1 is valid without this restriction. To compare our bounds in Theorem 1 with (8) and (9), let $U$ denote the square root of the upper bound in (6). Substituting $\max_i \|a_i\|$ with 1 in $U$, one can write

$$
\begin{aligned}
U &= \sqrt{1 + \frac{m - 2k}{\sigma^2_{\min,2k}(A)}}\;\sqrt{m - 2k}\,\alpha \\
&< \Big(1 + \frac{\sqrt{m - 2k}}{\sigma_{\min,2k}(A)}\Big)\sqrt{m - 2k}\,\alpha = U_2 \\
&\le \Big(1 + \frac{m - 2k}{\sigma_{\min,2k}(A)}\Big)(m - 2k)\,\alpha \\
&< \Big(1 + \frac{m}{\sigma_{\min,2k}(A)}\Big) m\alpha,
\end{aligned}
$$

where $U_2$ is the first term in the upper bound in (7) with $\max_i \|a_i\| = 1$. The first inequality uses $\sqrt{1 + t^2} < 1 + t$ for $t > 0$, the second uses $\sqrt{m - 2k} \le m - 2k$, and the last uses $k \ge 1$. These inequalities prove that the bounds (6) and (7) are strictly tighter than the corresponding bounds in [?], formulated in (8) and (9).
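The chain of inequalities in Remark 1 can also be checked numerically. The short sketch below is our own illustration and assumes unit-norm columns and $\alpha > 0$. It evaluates $U$, $U_2$, the intermediate quantity, and the right-hand side of (8) over a grid of admissible values of $m$, $k$, $\alpha$, and $\sigma_{\min,2k}(A)$.

```python
import itertools

import numpy as np


def remark1_chain(m, k, alpha, sigma):
    """Terms of the comparison chain in Remark 1 (unit-norm columns assumed)."""
    q = m - 2 * k
    U = np.sqrt(1 + q / sigma**2) * np.sqrt(q) * alpha  # square root of (6)
    U2 = (1 + np.sqrt(q) / sigma) * np.sqrt(q) * alpha  # first term of (7)
    mid = (1 + q / sigma) * q * alpha
    loose = (1 + m / sigma) * m * alpha  # right-hand side of (8)
    return U, U2, mid, loose


ok = True
grid = itertools.product([10, 50, 200], [1, 2, 4], [1e-3, 0.1], [0.05, 0.5, 1.0])
for m, k, alpha, sigma in grid:
    U, U2, mid, loose = remark1_chain(m, k, alpha, sigma)
    # U2 == mid exactly when m - 2k = 1, hence the small relative slack.
    ok &= bool(U < U2 <= mid * (1 + 1e-12) and mid < loose)
print(ok)
```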

Remark 2. In general, finding $\sigma_{\min,2k}(A)$ is a combinatorial problem¹ and NP-hard [?]. However, for a random matrix $A$, under some conditions, the smallest singular value of the $n \times 2k$ submatrices is highly concentrated around a certain value. In particular, let $A_{(2k)}$ denote any $n \times 2k$ submatrix of $A$. If all the entries of $A$ are independent and identically distributed (iid) from a normal distribution $\mathcal{N}(0, 1/n)$ and $2k < n$, then for any $t > 0$ we have [?]

$$ \mathrm{P}\Big\{\sigma_{\min}(A_{(2k)}) < 1 - \sqrt{2k/n} - t\Big\} \le e^{-nt^2/2}, $$

where $\mathrm{P}\{\cdot\}$ and $\sigma_{\min}(\cdot)$ denote the probability of the event described in the braces and the smallest singular value, respectively. This shows that, as the dimensions of $A$ increase, the smallest singular value of an $n \times 2k$ submatrix is equal to or larger than $1 - \sqrt{2k/n}$ with very high probability. In line with this, for any matrix with iid entries from a zero-mean, $1/n$-variance distribution with a finite fourth-order moment, when $n, m \to \infty$ while $2k/n \to c$, $\sigma_{\min}(A_{(2k)})$ converges to $1 - \sqrt{c}$ almost surely [?].
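The concentration statement in Remark 2 can be illustrated by a small Monte Carlo experiment. In the sketch below, the sizes, trial count, and seed are arbitrary choices of ours. It draws independent $n \times 2k$ matrices with iid $\mathcal{N}(0, 1/n)$ entries, which is the distribution of any fixed $n \times 2k$ submatrix $A_{(2k)}$. It then compares the empirical frequency of the event $\{\sigma_{\min}(A_{(2k)}) < 1 - \sqrt{2k/n} - t\}$ with the bound $e^{-nt^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, two_k, t, trials = 200, 20, 0.1, 500

# Smallest singular value of independent n x 2k Gaussian matrices, entries N(0, 1/n).
s_min = np.array([
    np.linalg.svd(rng.standard_normal((n, two_k)) / np.sqrt(n), compute_uv=False)[-1]
    for _ in range(trials)
])
threshold = 1 - np.sqrt(two_k / n) - t
freq = np.mean(s_min < threshold)
bound = np.exp(-n * t**2 / 2)
print(f"mean sigma_min = {s_min.mean():.3f}, 1 - sqrt(2k/n) = {1 - np.sqrt(two_k / n):.3f}")
print(f"empirical frequency = {freq:.3f}, bound = {bound:.3f}")
```

With these sizes, the empirical frequency is typically far below the bound, and the average smallest singular value lies close to $1 - \sqrt{2k/n} \approx 0.684$.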

Remark 3. In addition to the above probabilistic values for $\sigma_{\min,2k}(A)$, the bounds in Theorem 1 can also be stated in terms of $\delta_{2k}(A)$ instead of $\sigma_{\min,2k}(A)$. In fact,

$$ \sigma_{\min,2k}(A) = \min_{x \ne 0,\ \|x\|_0 \le 2k} \frac{\|Ax\|}{\|x\|}, $$

that is, $\|Ax\|^2 \ge \sigma^2_{\min,2k}(A)\|x\|^2$ for all $x$ with sparsity at most $2k$, with equality for some nonzero such $x$. Since $\delta_{2k}(A)$ in (5) is defined such that both inequalities in (5) are satisfied, it can be concluded that $1 - \delta_{2k}(A) \le \sigma^2_{\min,2k}(A)$. Consequently, under the condition $\delta_{2k}(A) < 1$, the following bounds are a reformulation of the bounds in Theorem 1 in terms of $\delta_{2k}(A)$, which is frequently used in the CS literature.


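• Noiseless case:

$$ \|x_0 - \hat{x}\|^2 \le \Big(1 + (m - 2k)\frac{\max_i \|a_i\|^2}{1 - \delta_{2k}(A)}\Big)(m - 2k)\,\alpha^2. $$

• Noisy case:

$$ \|x_0 - \hat{x}\| \le \Big(1 + \sqrt{m - 2k}\,\frac{\max_i \|a_i\|}{\sqrt{1 - \delta_{2k}(A)}}\Big)\sqrt{m - 2k}\,\alpha + \frac{\eta + \epsilon}{\sqrt{1 - \delta_{2k}(A)}}. $$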


¹ Since one should calculate the singular values of all $\binom{m}{2k}$ possible $n \times 2k$ submatrices of $A$.
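The following sketch is our own illustration and uses hypothetical names. It evaluates the Remark 3 bounds and checks numerically that $1 - \delta_{2k}(A) \le \sigma^2_{\min,2k}(A)$, so the RIC-based bounds are never smaller than those of Theorem 1. The example matrix $A = [I_4 \;\; Q]$, with $Q$ a random orthogonal matrix, is chosen only because it has $\delta_2(A) < 1$.

```python
import itertools

import numpy as np


def delta_and_sigma(A, s):
    """Brute force: (delta_s(A), sigma_min,s(A)) from Definitions 1 and 2."""
    delta, sigma = 0.0, np.inf
    for S in itertools.combinations(range(A.shape[1]), s):
        eig = np.linalg.eigvalsh(A[:, list(S)].T @ A[:, list(S)])
        delta = max(delta, 1 - eig[0], eig[-1] - 1)
        sigma = min(sigma, np.sqrt(max(eig[0], 0.0)))
    return delta, sigma


def remark3_bounds(A, x_hat, k, c, eta=0.0, eps=0.0):
    """Bounds (6)-(7) with sigma_min,2k(A)^2 replaced by a constant c.

    c = 1 - delta_2k(A) gives the Remark 3 bounds; c = sigma_min,2k(A)^2 gives Theorem 1.
    """
    if not c > 0:
        raise ValueError("need c > 0, i.e. delta_2k(A) < 1")
    q = A.shape[1] - 2 * k
    alpha = np.sort(np.abs(x_hat))[::-1][k]
    a_max = np.max(np.linalg.norm(A, axis=0))
    noiseless = (1 + q * a_max**2 / c) * q * alpha**2
    noisy = (1 + np.sqrt(q) * a_max / np.sqrt(c)) * np.sqrt(q) * alpha + (eta + eps) / np.sqrt(c)
    return noiseless, noisy


rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = np.hstack([np.eye(4), Q])
k = 1
delta, sigma = delta_and_sigma(A, 2 * k)
x_hat = rng.standard_normal(8)
rip = remark3_bounds(A, x_hat, k, 1 - delta)
thm1 = remark3_bounds(A, x_hat, k, sigma**2)
print(delta, 1 - delta <= sigma**2 + 1e-12, rip[0] >= thm1[0] * (1 - 1e-9))
```

For this particular $A$ and $k = 1$, the two constants happen to coincide ($\delta_2(A) = \max_{ij} |Q_{ij}|$). For general matrices, the RIC-based bounds can be strictly looser.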
2.2. Low-rank Matrix Recovery

Recovery of a low-rank matrix from compressed linear measurements [?] is the task of finding the low-rank matrix $X_0 \in \mathbb{R}^{n_1 \times n_2}$ from underdetermined measurements $b = \mathcal{A}(X_0)$, where $b \in \mathbb{R}^m$, $\mathcal{A}: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ is a linear operator, and $m < n_1 n_2$. In the presence of noise, the measurement model is changed to $b = \mathcal{A}(X_0) + e$, where $e$ is the vector of noise². This recovery problem generalizes the sparse vector recovery introduced in Section 1 to matrix variables. Consequently, the naive approach for recovering $X_0$ from either noiseless or noisy measurements is

$$ \min_X \mathrm{rank}(X) \quad \text{subject to} \quad \|\mathcal{A}(X) - b\| \le \epsilon, \tag{10} $$

where $\epsilon$ is some constant not less than $\|e\|$ in the noisy case and equal to 0 in the noiseless case.

In this subsection, we present upper bounds on the error of recovering or estimating low-rank matrices from noiseless and noisy measurements when the obtained solution is approximately low-rank. Similar to the vector case, a matrix is approximately low-rank if it has a few dominant singular values while its other singular values are very small. Before stating the results, we first recall the definition of the RIC for linear operators.

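Definition 3 ([?]). For a linear operator $\mathcal{A}: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ and all integers $r \le \min(n_1, n_2)$, the RIC of order $r$ is the smallest constant $\delta_r(\mathcal{A})$ such that

$$ (1 - \delta_r(\mathcal{A}))\|X\|_F^2 \le \|\mathcal{A}(X)\|^2 \le (1 + \delta_r(\mathcal{A}))\|X\|_F^2 $$

holds for all matrices $X$ with rank at most $r$.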
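Computing $\delta_r(\mathcal{A})$ exactly is intractable in general, but sampling gives a quick lower estimate. The sketch below is our own illustration. It assumes a Gaussian operator $\mathcal{A}(X) = M\,\mathrm{vec}(X)$ with iid $\mathcal{N}(0, 1/m)$ entries in $M$, a choice made only for illustration. It records the worst deviation of $\|\mathcal{A}(X)\|^2 / \|X\|_F^2$ from 1 over random rank-$r$ matrices. Because the RIC is a supremum over all rank-$r$ matrices, the result only bounds $\delta_r(\mathcal{A})$ from below.

```python
import numpy as np


def gaussian_operator(m, n1, n2, rng):
    """Hypothetical example operator A(X) = M vec(X) with iid N(0, 1/m) entries."""
    M = rng.standard_normal((m, n1 * n2)) / np.sqrt(m)
    return lambda X: M @ X.reshape(-1)


def ric_lower_estimate(op, n1, n2, r, trials, rng):
    """max |‖A(X)‖² / ‖X‖_F² - 1| over random rank-r X: a lower bound on delta_r."""
    worst = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
        ratio = np.sum(op(X) ** 2) / np.sum(X ** 2)
        worst = max(worst, abs(ratio - 1))
    return worst


rng = np.random.default_rng(6)
n1, n2, m = 20, 20, 200  # m < n1 * n2: underdetermined measurements
op = gaussian_operator(m, n1, n2, rng)
est = {r: ric_lower_estimate(op, n1, n2, r, 200, rng) for r in (1, 2)}
print(est)
```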

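Theorem 2. Let $\mathcal{A}: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$, $m < n_1 n_2$, denote a linear operator, and let $n = \min(n_1, n_2)$. We have the following upper bounds.

• Noiseless case: Suppose that $X_0$ is a rank-$r$ solution of $b = \mathcal{A}(X)$. If $0 < \delta_{2r}(\mathcal{A}) < 1$, then, for every solution $\hat{X}$ of $b = \mathcal{A}(X)$ satisfying $\sigma_{r+1}(\hat{X}) \le \alpha$,

$$ \|X_0 - \hat{X}\|_F^2 \le \Big(1 + (n - 2r)\frac{1 + \delta_1(\mathcal{A})}{1 - \delta_{2r}(\mathcal{A})}\Big)(n - 2r)\,\alpha^2. \tag{11} $$

• Noisy case: Let $X_0$ be any arbitrary matrix of rank $r$, and let $b = \mathcal{A}(X_0) + e$, where $e$ is noise with $\|e\| \le \epsilon$. If $0 < \delta_{2r}(\mathcal{A}) < 1$, then for every estimate $\hat{X}$ of $X_0$ satisfying $\|b - \mathcal{A}(\hat{X})\| \le \eta$ and $\sigma_{r+1}(\hat{X}) \le \alpha$, the error $\|X_0 - \hat{X}\|_F$ is bounded by

$$ \|X_0 - \hat{X}\|_F \le \Bigg(1 + \sqrt{(n - 2r)\frac{1 + \delta_1(\mathcal{A})}{1 - \delta_{2r}(\mathcal{A})}}\Bigg)\sqrt{n - 2r}\,\alpha + \frac{\eta + \epsilon}{\sqrt{1 - \delta_{2r}(\mathcal{A})}}. \tag{12} $$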
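Given values (or estimates) of $\delta_1(\mathcal{A})$ and $\delta_{2r}(\mathcal{A})$, the bounds (11) and (12) reduce to simple arithmetic. The helper below is a hypothetical sketch, and all argument names are our own. It takes $\alpha = \sigma_{r+1}(\hat{X})$, the smallest admissible value.

```python
import numpy as np


def theorem2_bounds(X_hat, r, delta1, delta2r, eta=0.0, eps=0.0):
    """Right-hand sides of (11) (squared Frobenius error) and (12)."""
    if not 0 < delta2r < 1:
        raise ValueError("Theorem 2 assumes 0 < delta_2r < 1")
    s = np.linalg.svd(X_hat, compute_uv=False)
    n = len(s)  # n = min(n1, n2)
    alpha = s[r]  # sigma_{r+1}(X_hat) in the 1-based indexing of the text
    q = n - 2 * r
    ratio = (1 + delta1) / (1 - delta2r)
    bound11 = (1 + q * ratio) * q * alpha**2
    bound12 = (1 + np.sqrt(q * ratio)) * np.sqrt(q) * alpha + (eta + eps) / np.sqrt(1 - delta2r)
    return bound11, bound12


X_hat = np.diag([3.0, 2.0, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0])
b11, b12 = theorem2_bounds(X_hat, r=2, delta1=0.1, delta2r=0.2)
print(b11, b12)
```

The values $\delta_1 = 0.1$ and $\delta_4 = 0.2$ are arbitrary illustration inputs, not the RICs of any particular operator. With $n = 10$, $r = 2$, and $\alpha = 0.1$, (11) evaluates to $(1 + 6 \cdot 1.375) \cdot 6 \cdot 0.01 = 0.555$.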


3. Proofs of Results 

3.1. Proof of Theorem 1

We need the following lemmas. 


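Lemma 1. Let $A \in \mathbb{R}^{n \times m}$, $m > n$, be a sensing matrix. For every $x \in \mathrm{null}(A)$ and any subset $I$ of $\{1, \dots, m\}$ with cardinality $m - p$, where $p \le \mathrm{spark}(A) - 1$, we have

$$ \|x\|^2 \le \Big(1 + (m - p)\frac{\max_i \|a_i\|^2}{\sigma^2_{\min,p}(A)}\Big)\|x_I\|^2. \tag{13} $$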


²The parameters $b$, $m$, $e$, and $n$ (the last one defined later in this subsection) should not be confused with the similar parameters defined in Subsection 2.1.
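Proof: First, we notice that

$$
\begin{aligned}
\Big\|\sum_{i \in I} x_i a_i\Big\|^2 &\le \Big(\sum_{i \in I} |x_i|\,\|a_i\|\Big)^2 \le \max_i \|a_i\|^2 \Big(\sum_{i \in I} |x_i|\Big)^2 \\
&= \max_i \|a_i\|^2\,\|x_I\|_1^2 \le (m - p)\max_i \|a_i\|^2\,\|x_I\|^2,
\end{aligned} \tag{14}
$$

where, for the last inequality, we used $\|z\|_1^2 \le l\,\|z\|^2$ for all $z \in \mathbb{R}^l$. Next, from $Ax = \sum_{i \in I} x_i a_i + \sum_{i \in \bar{I}} x_i a_i = 0$, we get

$$ \Big\|\sum_{i \in I} x_i a_i\Big\|^2 = \|A_{\bar{I}} x_{\bar{I}}\|^2 \ge \sigma^2_{\min,p}(A)\,\|x_{\bar{I}}\|^2, \tag{15} $$

where $\bar{I} = \{1, \dots, m\} \setminus I$. Combining inequalities (14) and (15) and using $\|x\|^2 = \|x_I\|^2 + \|x_{\bar{I}}\|^2$ proves (13). Note that $p \le \mathrm{spark}(A) - 1$ implies $\sigma_{\min,p}(A) > 0$, so inequality (13) is not trivial. ■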
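Lemma 1 can be sanity-checked numerically on a random instance. The sketch below is our own illustration, with arbitrary sizes and seed. It draws $x \in \mathrm{null}(A)$ from an SVD-based null-space basis, computes $\sigma_{\min,p}(A)$ by brute force, and verifies (13) for every index set $I$ of cardinality $m - p$.

```python
import itertools

import numpy as np

rng = np.random.default_rng(7)
n, m, p = 5, 9, 3  # p <= spark(A) - 1 = 5 almost surely
A = rng.standard_normal((n, m))
sigma_p = min(np.linalg.svd(A[:, list(S)], compute_uv=False)[-1]
              for S in itertools.combinations(range(m), p))
a_max2 = np.max(np.sum(A**2, axis=0))  # max_i ||a_i||^2

x = np.linalg.svd(A)[2][n:].T @ rng.standard_normal(m - n)  # x in null(A)
const = 1 + (m - p) * a_max2 / sigma_p**2
lemma1_holds = all(
    x @ x <= const * np.sum(x[list(I)] ** 2)
    for I in itertools.combinations(range(m), m - p)
)
print(lemma1_holds)
```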

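Lemma 2. Let $A \in \mathbb{R}^{n \times m}$, $m > n$, be a sensing matrix. For every $x$ satisfying $\|Ax\| \le \eta$ and every subset $I$ of $\{1, \dots, m\}$ with cardinality $m - p$, where $p \le \mathrm{spark}(A) - 1$, we have

$$ \|x\| \le \Big(1 + \sqrt{m - p}\,\frac{\max_i \|a_i\|}{\sigma_{\min,p}(A)}\Big)\|x_I\| + \frac{\eta}{\sigma_{\min,p}(A)}. \tag{16} $$

Proof: Similar to the proof of Lemma 1, we have

$$ \Big\|\sum_{i \in I} x_i a_i\Big\| \le \sqrt{m - p}\,\max_i \|a_i\|\,\|x_I\|. \tag{17} $$

Furthermore, from $Ax = \sum_{i \in I} x_i a_i + \sum_{i \in \bar{I}} x_i a_i$, we get

$$
\begin{aligned}
\Big\|\sum_{i \in I} x_i a_i\Big\| &\ge \|A_{\bar{I}} x_{\bar{I}}\| - \|Ax\| \\
&\ge \sigma_{\min,p}(A)\,\|x_{\bar{I}}\| - \|Ax\| \\
&\ge \sigma_{\min,p}(A)\,\|x_{\bar{I}}\| - \eta.
\end{aligned} \tag{18}
$$

Combining inequalities (17) and (18) leads to

$$ \sigma_{\min,p}(A)\,\|x_{\bar{I}}\| \le \sqrt{m - p}\,\max_i \|a_i\|\,\|x_I\| + \eta, $$

which is equivalent to

$$ \|x_I\| + \|x_{\bar{I}}\| \le \Big(1 + \sqrt{m - p}\,\frac{\max_i \|a_i\|}{\sigma_{\min,p}(A)}\Big)\|x_I\| + \frac{\eta}{\sigma_{\min,p}(A)}. $$

The above inequality together with

$$ \|x\| = \left\| \begin{bmatrix} x_I \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_{\bar{I}} \end{bmatrix} \right\| \le \|x_I\| + \|x_{\bar{I}}\|, $$

where $0$ is a vector of zeros of appropriate length (and, without loss of generality, the components indexed by $I$ are listed first), proves (16). ■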
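A similar check applies to Lemma 2. Here $x$ is an arbitrary vector, not necessarily in $\mathrm{null}(A)$, and $\eta$ is taken as $\|Ax\|$. Again, this is only an illustrative sketch with arbitrary sizes and seed.

```python
import itertools

import numpy as np

rng = np.random.default_rng(8)
n, m, p = 5, 9, 3
A = rng.standard_normal((n, m))
sigma_p = min(np.linalg.svd(A[:, list(S)], compute_uv=False)[-1]
              for S in itertools.combinations(range(m), p))
a_max = np.max(np.linalg.norm(A, axis=0))  # max_i ||a_i||

x = rng.standard_normal(m)  # arbitrary vector
eta = np.linalg.norm(A @ x)  # smallest eta with ||Ax|| <= eta
lemma2_holds = all(
    np.linalg.norm(x)
    <= (1 + np.sqrt(m - p) * a_max / sigma_p) * np.linalg.norm(x[list(I)]) + eta / sigma_p
    for I in itertools.combinations(range(m), m - p)
)
print(lemma2_holds)
```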


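Proof of Theorem 1: To prove (6), we first notice that because $x_0$ has $k$ nonzero components and $|\hat{x}^{\downarrow}_{k+1}| \le \alpha$, the vector $\tilde{x} = x_0 - \hat{x}$ has at most $2k$ components with magnitude larger than $\alpha$. Indeed, outside the support of $x_0$ we have $\tilde{x}_i = -\hat{x}_i$, and at most $k$ components of $\hat{x}$ exceed $\alpha$ in magnitude. Equivalently, $\tilde{x}$ possesses at least $m - 2k$ components with magnitude not greater than $\alpha$. Now, let $I$ denote a set of indices of components of $\tilde{x}$ with magnitude less than or equal to $\alpha$ such that $|I| = m - 2k$. It is clear that $\|\tilde{x}_I\|^2 \le (m - 2k)\alpha^2$. Consequently, since $\tilde{x} \in \mathrm{null}(A)$, we can apply Lemma 1 with $p = 2k \le \mathrm{spark}(A) - 1$ to get

$$
\begin{aligned}
\|x_0 - \hat{x}\|^2 &\le \Big(1 + (m - 2k)\frac{\max_i \|a_i\|^2}{\sigma^2_{\min,2k}(A)}\Big)\|\tilde{x}_I\|^2 \\
&\le \Big(1 + (m - 2k)\frac{\max_i \|a_i\|^2}{\sigma^2_{\min,2k}(A)}\Big)(m - 2k)\,\alpha^2.
\end{aligned}
$$

For proving (7), we start with

$$
\begin{aligned}
\|A(x_0 - \hat{x})\| &= \|b - A\hat{x} + Ax_0 - b\| \\
&\le \|b - A\hat{x}\| + \|Ax_0 - b\| \\
&\le \eta + \epsilon.
\end{aligned} \tag{19}
$$

Following the same reasoning as in the proof of (6), the application of Lemma 2 with $\eta$ replaced by $\eta + \epsilon$ proves (7). ■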
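The counting step at the start of this proof is purely combinatorial and does not use $A$ at all. It states that $\tilde{x} = x_0 - \hat{x}$ has at most $2k$ components larger than $\alpha = |\hat{x}^{\downarrow}_{k+1}|$ in magnitude. The sketch below, with arbitrary sizes and seed, checks it on random $k$-sparse $x_0$ and arbitrary $\hat{x}$.

```python
import numpy as np

rng = np.random.default_rng(9)
m, k = 30, 4
max_count = 0
for _ in range(1000):
    x0 = np.zeros(m)
    x0[rng.choice(m, size=k, replace=False)] = rng.standard_normal(k)
    x_hat = rng.standard_normal(m) * rng.choice([1e-3, 1.0], size=m)  # mixed scales
    alpha = np.sort(np.abs(x_hat))[::-1][k]  # |x_hat↓_{k+1}|
    count = int(np.sum(np.abs(x0 - x_hat) > alpha))
    max_count = max(max_count, count)
print(max_count, "<=", 2 * k)
```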


3.2. Proof of Theorem 2

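Lemma 3. Let $\mathcal{A}: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$, $m < n_1 n_2$, denote a linear operator. For every $r < n = \min(n_1, n_2)$ and every $X \in \mathrm{null}(\mathcal{A})$, if $0 < \delta_r(\mathcal{A}) < 1$, then

$$ \|X\|_F^2 \le \Big(1 + (n - r)\frac{1 + \delta_1(\mathcal{A})}{1 - \delta_r(\mathcal{A})}\Big)\|X_{(-r)}\|_F^2. \tag{20} $$

Proof: Let $X = \sum_{i=1}^{n} \sigma_i u_i v_i^T$ denote the SVD of $X$. We can write

$$
\begin{aligned}
\|\mathcal{A}(X_{(-r)})\|^2 &= \Big\|\mathcal{A}\Big(\sum_{i=r+1}^{n} \sigma_i u_i v_i^T\Big)\Big\|^2 = \Big\|\sum_{i=r+1}^{n} \sigma_i\,\mathcal{A}(u_i v_i^T)\Big\|^2 \\
&\le \Big(\sum_{i=r+1}^{n} \sigma_i\,\|\mathcal{A}(u_i v_i^T)\|\Big)^2 \\
&\overset{(a)}{\le} \Big(\sum_{i=r+1}^{n} \sigma_i \sqrt{1 + \delta_1(\mathcal{A})}\Big)^2 = (1 + \delta_1(\mathcal{A}))\,\|X_{(-r)}\|_*^2 \\
&\overset{(b)}{\le} (n - r)(1 + \delta_1(\mathcal{A}))\,\|X_{(-r)}\|_F^2,
\end{aligned} \tag{21}
$$

where (a) follows from the definition of the RIC and $\|u_i v_i^T\|_F = 1$. For (b), we used the inequality $\|Y\|_* \le \sqrt{\mathrm{rank}(Y)}\,\|Y\|_F$ together with $\mathrm{rank}(X_{(-r)}) \le n - r$. Additionally, $\mathcal{A}(X) = \mathcal{A}(X_{(r)}) + \mathcal{A}(X_{(-r)}) = 0$ implies that

$$ \|\mathcal{A}(X_{(-r)})\|^2 = \|\mathcal{A}(X_{(r)})\|^2 \ge (1 - \delta_r(\mathcal{A}))\,\|X_{(r)}\|_F^2. \tag{22} $$

Combining (21) and (22) together with $\|X\|_F^2 = \|X_{(r)}\|_F^2 + \|X_{(-r)}\|_F^2$ leads to inequality (20). ■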


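Lemma 4. Let $\mathcal{A}: \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$, $m < n_1 n_2$, denote a linear operator. For every $r < n = \min(n_1, n_2)$ and every $X$ satisfying $\|\mathcal{A}(X)\| \le \eta$, if $0 < \delta_r(\mathcal{A}) < 1$, then

$$ \|X\|_F \le \Bigg(1 + \sqrt{(n - r)\frac{1 + \delta_1(\mathcal{A})}{1 - \delta_r(\mathcal{A})}}\Bigg)\|X_{(-r)}\|_F + \frac{\eta}{\sqrt{1 - \delta_r(\mathcal{A})}}. \tag{23} $$

Proof: Inequality (21) holds for every $X$; thus, it is possible to write

$$ \|\mathcal{A}(X_{(-r)})\| \le \sqrt{(n - r)(1 + \delta_1(\mathcal{A}))}\,\|X_{(-r)}\|_F. \tag{24} $$

Furthermore, applying the triangle inequality to $\mathcal{A}(X_{(-r)}) = \mathcal{A}(X) - \mathcal{A}(X_{(r)})$, one can obtain

$$
\begin{aligned}
\|\mathcal{A}(X_{(-r)})\| &\ge \|\mathcal{A}(X_{(r)})\| - \|\mathcal{A}(X)\| \\
&\ge \sqrt{1 - \delta_r(\mathcal{A})}\,\|X_{(r)}\|_F - \eta.
\end{aligned} \tag{25}
$$

Combining inequalities (24) and (25) together with $\|X\|_F \le \|X_{(r)}\|_F + \|X_{(-r)}\|_F$ gives inequality (23). ■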

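Proof of Theorem 2: To prove (11), let us first define $\tilde{X} = X_0 - \hat{X}$. According to [?, Theorem 3.3.16], for any matrices $Y$ and $Z$ of the same size and any $1 \le i, j \le n$ with $i + j \le n + 1$,

$$ \sigma_{i+j-1}(Y + Z) \le \sigma_i(Y) + \sigma_j(Z). $$

Applying this inequality with $Y = X_0$ and $Z = -\hat{X}$, and substituting both $i$ and $j$ with $r + 1$, leads to

$$ \sigma_{2r+1}(\tilde{X}) \le \sigma_{r+1}(X_0) + \sigma_{r+1}(\hat{X}) \le \alpha, $$

since $\sigma_{r+1}(X_0) = 0$. Hence all singular values of $\tilde{X}$ beyond the $2r$th are at most $\alpha$, so $\|\tilde{X}_{(-2r)}\|_F^2 \le (n - 2r)\alpha^2$. Consequently, since $\tilde{X} \in \mathrm{null}(\mathcal{A})$, Lemma 3 with $2r$ in place of $r$ implies that

$$ \|X_0 - \hat{X}\|_F^2 \le \Big(1 + (n - 2r)\frac{1 + \delta_1(\mathcal{A})}{1 - \delta_{2r}(\mathcal{A})}\Big)\|\tilde{X}_{(-2r)}\|_F^2 \le \Big(1 + (n - 2r)\frac{1 + \delta_1(\mathcal{A})}{1 - \delta_{2r}(\mathcal{A})}\Big)(n - 2r)\,\alpha^2. $$

For proving (12), we start with

$$ \|\mathcal{A}(X_0 - \hat{X})\| = \|b - \mathcal{A}(\hat{X}) + \mathcal{A}(X_0) - b\| \le \eta + \epsilon. $$

Following the same reasoning as in the proof of (11), the application of Lemma 4 with $\eta$ replaced by $\eta + \epsilon$ completes the proof. ■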
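Two steps of the proof of Theorem 2 can be checked numerically. The first is the singular-value inequality $\sigma_{i+j-1}(Y + Z) \le \sigma_i(Y) + \sigma_j(Z)$. The second is its consequence $\sigma_{2r+1}(X_0 - \hat{X}) \le \sigma_{r+1}(\hat{X})$ for a rank-$r$ matrix $X_0$. This is an illustrative sketch: the sizes, the seed, and the round-off slack `1e-10` are our own choices, and indices are 1-based to match the text.

```python
import numpy as np


def sv(M, i):
    """i-th largest singular value of M (1-based index)."""
    return np.linalg.svd(M, compute_uv=False)[i - 1]


rng = np.random.default_rng(10)
n1, n2, r = 7, 5, 2
n = min(n1, n2)
Y, Z = rng.standard_normal((n1, n2)), rng.standard_normal((n1, n2))
weyl_holds = all(
    sv(Y + Z, i + j - 1) <= sv(Y, i) + sv(Z, j) + 1e-10
    for i in range(1, n + 1)
    for j in range(1, n + 1)
    if i + j <= n + 1
)

X0 = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # rank r
X_hat = X0 + 1e-2 * rng.standard_normal((n1, n2))  # approximately rank r
step_holds = sv(X0 - X_hat, 2 * r + 1) <= sv(X_hat, r + 1) + 1e-10
print(weyl_holds, step_holds)
```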

4. Conclusion 

In this paper, we proposed upper bounds on the error of sparse vector recovery from both noiseless and noisy measurements when the obtained solution is approximately sparse. While these bounds are based on the same parameters as the loose bounds of [?], they are strictly tighter. We further generalized them to the problem of low-rank matrix recovery, when the solution at hand is approximately low-rank.